Exponential decay property


Scalable spectral representations for multi-agent reinforcement learning in network MDPs

Ren, Zhaolin, Zhang, Runyu, Dai, Bo, Li, Na

arXiv.org Artificial Intelligence

Multi-agent network systems have found applications in various societal infrastructures, such as power systems, traffic networks, and smart cities [McArthur et al., 2007, Burmeister et al., 1997, Roscia et al., 2013]. One particularly important class of such problems is the cooperative multi-agent network MDP setting, where agents are embedded in a graph and each agent has its own local state [Qu et al., 2020b]. In network MDPs, the local state transition probabilities and rewards depend only on the states and actions of the agent's direct neighbors in the graph. This property has been observed in a wide variety of cooperative network control problems, ranging from thermal control of multizone buildings [Zhang et al., 2016] and wireless access control [Zocca, 2019] to phase synchronization in electrical grids [Blaabjerg et al., 2006], where agents typically only need to act and learn based on information within a local neighborhood due to constraints on the information and communication infrastructure.
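The neighborhood-dependent dynamics described above can be sketched in a few lines. The chain graph, binary local states, and transition probabilities below are illustrative assumptions, not taken from any of the listed papers:

```python
import random

# Minimal sketch of a network MDP on a line graph of 5 agents.
# All dynamics and rewards here are made up for illustration.
N = 5
neighbors = {i: [j for j in (i - 1, i, i + 1) if 0 <= j < N] for i in range(N)}

def step(states, actions, rng):
    """Each agent's next local state depends only on its neighborhood."""
    nxt = []
    for i in range(N):
        # Transition probability uses only neighbors' states and agent i's action.
        local = sum(states[j] for j in neighbors[i])
        p = min(1.0, 0.2 * local + 0.5 * actions[i])
        nxt.append(1 if rng.random() < p else 0)
    return nxt

def reward(states, actions):
    """Global reward is an average of local rewards, each depending only on a neighborhood."""
    return sum(sum(states[j] for j in neighbors[i]) - 0.1 * actions[i]
               for i in range(N)) / N

rng = random.Random(0)
s = [0] * N
a = [1] * N
for _ in range(3):
    s = step(s, a, rng)
r = reward(s, a)
```

The point of the sketch is structural: neither `step` nor `reward` ever reads a state outside an agent's one-hop neighborhood, which is exactly the locality that the papers below exploit.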


Distributed Policy Gradient for Linear Quadratic Networked Control with Limited Communication Range

Yan, Yuzi, Shen, Yuan

arXiv.org Artificial Intelligence

This paper proposes a scalable distributed policy gradient method and proves its convergence to a near-optimal solution in multi-agent linear quadratic networked systems. The agents interact over a specified network under local communication constraints, meaning that each agent can exchange information only with a limited number of neighboring agents. On the underlying graph of the network, each agent computes its control input from its nearby neighbors' states in the linear quadratic control setting. We show that it is possible to approximate the exact gradient using only local information. Compared with the centralized optimal controller, the performance gap decreases to zero exponentially as the communication and control ranges increase. We also demonstrate how increasing the communication range enhances system stability in the gradient descent process, thereby elucidating a critical trade-off. The simulation results verify our theoretical findings.
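As a rough illustration of a limited-communication-range controller, the sketch below runs projected gradient descent on a small chain LQR problem, zeroing gradient entries of the gain matrix outside a kappa-hop band so that each agent's control uses only nearby states. The dynamics, horizon, step size, and numerical gradient are all hypothetical choices, not the paper's algorithm:

```python
import numpy as np

# Chain LQR with stable, nearest-neighbor dynamics (illustrative values).
n, kappa = 6, 1
A = 0.4 * np.eye(n) + 0.1 * np.eye(n, k=1) + 0.1 * np.eye(n, k=-1)
B = np.eye(n)
Q, R = np.eye(n), np.eye(n)
# Communication band: agent i may use state j only if |i - j| <= kappa.
mask = np.array([[abs(i - j) <= kappa for j in range(n)] for i in range(n)], float)

def cost(K, T=50):
    """Finite-horizon LQR cost from a fixed initial state (illustrative)."""
    x, c = np.ones(n), 0.0
    for _ in range(T):
        u = -K @ x
        c += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return c

def numerical_grad(K, eps=1e-5):
    """Central-difference gradient of cost with respect to K."""
    G = np.zeros_like(K)
    for i in range(n):
        for j in range(n):
            E = np.zeros_like(K); E[i, j] = eps
            G[i, j] = (cost(K + E) - cost(K - E)) / (2 * eps)
    return G

K = np.zeros((n, n))
for _ in range(30):
    # Project the gradient onto the communication band before each update.
    K -= 0.01 * (numerical_grad(K) * mask)
```

After the loop, `K` is still banded (zero outside the kappa-hop band) yet achieves a lower cost than the zero controller, mirroring the paper's theme that local information suffices for near-optimal control.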


Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward

Qu, Guannan, Lin, Yiheng, Wierman, Adam, Li, Na

arXiv.org Artificial Intelligence

It has long been recognized that multi-agent reinforcement learning (MARL) faces significant scalability issues because the sizes of the state and action spaces grow exponentially in the number of agents. In this paper, we identify a rich class of networked MARL problems where the model exhibits a local dependence structure that allows it to be solved in a scalable manner. Specifically, we propose a Scalable Actor-Critic (SAC) method that can learn a near-optimal localized policy for optimizing the average reward, with complexity scaling with the state-action space size of local neighborhoods rather than that of the entire network. Our result centers on identifying and exploiting an exponential decay property that ensures the effect of agents on each other decays exponentially fast in their graph distance.
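The exponential decay property can be illustrated on a toy chain with nearest-neighbor linear dynamics: the discounted influence of one agent's initial state on another's trajectory shrinks geometrically in their graph distance. The matrix `A`, discount, and horizon below are made-up values for illustration only:

```python
import numpy as np

# Chain of 10 agents; each agent's state is coupled only to its neighbors.
n, gamma = 10, 0.9
A = 0.3 * np.eye(n) + 0.2 * np.eye(n, k=1) + 0.2 * np.eye(n, k=-1)

def influence(i, j, T=60):
    """Discounted sensitivity of agent i's trajectory to agent j's initial state."""
    M, total = np.eye(n), 0.0
    for _ in range(T):
        total += abs(M[i, j])
        M = gamma * (A @ M)   # one more discounted step of the dynamics
    return total

# Influence of agents at increasing graph distance on agent 0.
vals = [influence(0, d) for d in range(n)]
ratios = [vals[d + 1] / vals[d] for d in range(n - 1)]
```

Here `vals[d]` falls off rapidly in the distance `d`, which is the structure that lets a truncated, neighborhood-only policy or critic remain accurate.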


Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems

Qu, Guannan, Wierman, Adam, Li, Na

arXiv.org Artificial Intelligence

We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner, and where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a Scalable Actor-Critic (SAC) framework that exploits the network structure and finds a localized policy that is an $O(\rho^\kappa)$-approximation of a stationary point of the objective for some $\rho\in(0,1)$, with complexity that scales with the local state-action space size of the largest $\kappa$-hop neighborhood of the network.
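To see why complexity scales with the largest $\kappa$-hop neighborhood rather than with the whole network, the back-of-the-envelope sketch below compares the size of a truncated per-agent table with a centralized one. The line graph and the local state/action counts are hypothetical:

```python
# Illustrative counting argument, not the paper's algorithm.
def khop(i, n, kappa):
    """Agents within graph distance kappa of agent i on a line graph."""
    return [j for j in range(n) if abs(i - j) <= kappa]

n, kappa = 100, 2      # 100 agents on a line, 2-hop truncation
S, Aact = 3, 2         # 3 local states, 2 local actions per agent

largest = max(len(khop(i, n, kappa)) for i in range(n))
local_table = (S * Aact) ** largest   # truncated per-agent Q-table entries
global_table = (S * Aact) ** n        # centralized Q-table entries
```

With these toy numbers the truncated table has $6^5 = 7776$ entries per agent, independent of the network size, while a centralized table has $6^{100}$ entries, which is the exponential blow-up the abstract refers to.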